Meta-Learning for Simple Regret Minimization
Authors
Abstract
We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm has access to a prior distribution over the meta-parameters, and its meta simple regret over m bandit tasks with horizon n is a mere O(m/√n). On the other hand, the meta simple regret of the frequentist algorithm is O(n√m + m/√n). While its regret is worse, the frequentist algorithm is more general because it does not need a prior distribution over the meta-parameters. It can also be analyzed in more settings. We instantiate our algorithms for several classes of bandit problems. Our algorithms are general, and we complement our theory by evaluating them empirically in several environments.
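To compare the two rates quoted in the abstract, one can divide each by the number of tasks m to obtain a per-task average. This is only arithmetic on the stated rates, ignoring constants and any logarithmic factors (an assumption, since the abstract does not give them):

```latex
\begin{align*}
\text{Bayesian:} \quad
  & \frac{1}{m}\, O\!\left(\frac{m}{\sqrt{n}}\right)
    = O\!\left(\frac{1}{\sqrt{n}}\right), \\
\text{frequentist:} \quad
  & \frac{1}{m}\, O\!\left(n\sqrt{m} + \frac{m}{\sqrt{n}}\right)
    = O\!\left(\frac{n}{\sqrt{m}} + \frac{1}{\sqrt{n}}\right).
\end{align*}
```

Under this reading, the extra frequentist term n/√m shrinks as the number of tasks m grows, and it is no larger than the 1/√n term once m ≥ n³.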
Similar resources
4 Learning, Regret Minimization, and Equilibria
Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive to work each day, or repeated play of a game against an opponent with an unknown strategy. In this chapter we describe learning algorithms with strong guarantees for settings of this type, along with connections to game-theoretic equilibria when all players in a system are...
Regret Minimization for Partially Observable Deep Reinforcement Learning
Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial or non-Markovian observations ...
Regret Minimization for Branching Experts
We study regret minimization bounds in which the dependence on the number of experts is replaced by measures of the realized complexity of the expert class. The measures we consider are defined in retrospect given the realized losses. We concentrate on two interesting cases. In the first, our measure of complexity is the number of different “leading experts”, namely, experts that were best at s...
Efficient Constrained Regret Minimization
Online learning constitutes a mathematical and compelling framework to analyze sequential decision making problems in adversarial environments. The learner repeatedly chooses an action, the environment responds with an outcome, and then the learner receives a reward for the played action. The goal of the learner is to maximize his total reward. However, there are situations in which, in additio...
Regret Minimization Algorithms for Pricing Lookback Options
In this work, we extend the applicability of regret minimization to pricing financial instruments, following the work of [10]. More specifically, we consider pricing a type of exotic option called a fixed-strike lookback call option. A fixed-strike lookback call option has a known expiration time, at which the option holder has the right to receive the difference between the maximal price of a ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: ['2159-5399', '2374-3468']
DOI: https://doi.org/10.1609/aaai.v37i6.25823